Overview

Brought to you by YData

Dataset statistics

Statistic                        Dataset A   Dataset B
Number of variables              12          12
Number of observations           446         446
Missing cells                    449         428
Missing cells (%)                8.4%        8.0%
Duplicate rows                   0           0
Duplicate rows (%)               0.0%        0.0%
Total size in memory             45.3 KiB    45.3 KiB
Average record size in memory    104.0 B     104.0 B
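
The percentage rows above follow from the raw counts. The sketch below recomputes the missing-cell percentage, assuming it is taken over all cells (variables times observations). The function name is illustrative and not part of ydata-profiling.

```python
def missing_cells_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells among all cells, in percent."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


# Counts taken from the Dataset statistics table.
pct_a = missing_cells_pct(449, 12, 446)
pct_b = missing_cells_pct(428, 12, 446)
print(f"Dataset A: {pct_a:.1f}%  Dataset B: {pct_b:.1f}%")
```

Under that assumption the results round to the reported 8.4% and 8.0%.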

Variable types

Type          Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                                        Dataset B                               Alert type
Sex is highly overall correlated with Survived   Alert not present in this dataset       High correlation
Survived is highly overall correlated with Sex   Alert not present in this dataset       High correlation
Age has 98 (22.0%) missing values                Age has 89 (20.0%) missing values       Missing
Cabin has 349 (78.3%) missing values             Cabin has 337 (75.6%) missing values    Missing
PassengerId has unique values                    PassengerId has unique values           Unique
Name has unique values                           Name has unique values                  Unique
SibSp has 303 (67.9%) zeros                      SibSp has 309 (69.3%) zeros             Zeros
Parch has 332 (74.4%) zeros                      Parch has 341 (76.5%) zeros             Zeros
Fare has 9 (2.0%) zeros                          Fare has 7 (1.6%) zeros                 Zeros

Reproduction

                         Dataset A                    Dataset B
Analysis started         2025-01-20 16:51:20.284363   2025-01-20 16:51:22.706938
Analysis finished        2025-01-20 16:51:22.703694   2025-01-20 16:51:25.110202
Duration                 2.42 seconds                 2.4 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json

Variables

PassengerId
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            435.42377    447.66592
Minimum         1            7
Maximum         889          891
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     1           7
5-th percentile             36.5        51.5
Q1                          216.25      234.25
median                      434         437
Q3                          653.75      669.75
95-th percentile            842         840.75
Maximum                     889         891
Range                       888         884
Interquartile range (IQR)   437.5       435.5

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                256.90145       254.94299
Coefficient of variation (CV)     0.59000329      0.56949385
Kurtosis                          -1.1630314      -1.2096078
Mean                              435.42377       447.66592
Median Absolute Deviation (MAD)   219.5           218
Skewness                          0.050012232     0.0090972841
Sum                               194199          199659
Variance                          65998.357       64995.926
Monotonicity                      Not monotonic   Not monotonic
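
Several of the PassengerId figures can be derived from others in the tables. The following sketch recomputes them from the reported values, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, range = maximum minus minimum, IQR = Q3 minus Q1). The helper name is hypothetical.

```python
def derived_stats(mean, std, minimum, maximum, q1, q3):
    """Recompute derived statistics from the basic reported ones."""
    return {
        "cv": std / mean,
        "variance": std ** 2,
        "range": maximum - minimum,
        "iqr": q3 - q1,
    }


# Reported PassengerId values for Dataset A and Dataset B.
stats_a = derived_stats(435.42377, 256.90145, 1, 889, 216.25, 653.75)
stats_b = derived_stats(447.66592, 254.94299, 7, 891, 234.25, 669.75)

for name, stats in (("Dataset A", stats_a), ("Dataset B", stats_b)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

The recomputed values agree with the report up to the rounding of the printed inputs.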
[Figure: histogram with fixed size bins (bins=50)]
Common values, Dataset A

Value                Count   Frequency (%)
686                  1       0.2%
806                  1       0.2%
414                  1       0.2%
887                  1       0.2%
79                   1       0.2%
793                  1       0.2%
308                  1       0.2%
260                  1       0.2%
202                  1       0.2%
525                  1       0.2%
Other values (436)   436     97.8%

Common values, Dataset B

Value                Count   Frequency (%)
55                   1       0.2%
366                  1       0.2%
290                  1       0.2%
57                   1       0.2%
156                  1       0.2%
42                   1       0.2%
351                  1       0.2%
682                  1       0.2%
875                  1       0.2%
795                  1       0.2%
Other values (436)   436     97.8%

Smallest values, Dataset A

Value   Count   Frequency (%)
1       1       0.2%
2       1       0.2%
3       1       0.2%
4       1       0.2%
6       1       0.2%
7       1       0.2%
10      1       0.2%
14      1       0.2%
15      1       0.2%
16      1       0.2%

Smallest values, Dataset B

Value   Count   Frequency (%)
7       1       0.2%
8       1       0.2%
12      1       0.2%
15      1       0.2%
16      1       0.2%
18      1       0.2%
21      1       0.2%
22      1       0.2%
25      1       0.2%
27      1       0.2%

Survived
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   0           0
2nd row   0           1
3rd row   0           1
4th row   1           0
5th row   0           0

Common Values

Value   Dataset A count (%)   Dataset B count (%)
0       274 (61.4%)           282 (63.2%)
1       172 (38.6%)           164 (36.8%)

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: one plot each for Dataset A and Dataset B]

Most occurring characters

Character   Dataset A count (%)   Dataset B count (%)
0           274 (61.4%)           282 (63.2%)
1           172 (38.6%)           164 (36.8%)

Most occurring categories, scripts, and blocks

In both datasets, all 446 characters (100.0%) fall into a single Unicode category, script, and block, each reported as (unknown). The most frequent characters per category, per script, and per block are therefore the same as the counts under "Most occurring characters".

Pclass
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   3           3
2nd row   2           3
3rd row   2           2
4th row   2           1
5th row   3           2

Common Values

Value   Dataset A count (%)   Dataset B count (%)
3       251 (56.3%)           248 (55.6%)
1       102 (22.9%)           109 (24.4%)
2       93 (20.9%)            89 (20.0%)

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: one plot each for Dataset A and Dataset B]

Most occurring characters

Character   Dataset A count (%)   Dataset B count (%)
3           251 (56.3%)           248 (55.6%)
1           102 (22.9%)           109 (24.4%)
2           93 (20.9%)            89 (20.0%)

Most occurring categories, scripts, and blocks

In both datasets, all 446 characters (100.0%) fall into a single Unicode category, script, and block, each reported as (unknown). The most frequent characters per category, per script, and per block are therefore the same as the counts under "Most occurring characters".

Name
Text

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      82          82
Median length   53          50
Mean length     27.464126   26.524664
Min length      12          12

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      12249       11830
Distinct characters   59          59
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

Row       Dataset A                        Dataset B
1st row   Johansson, Mr. Karl Johan        Adahl, Mr. Mauritz Nils Martin
2nd row   Cunningham, Mr. Alfred Fleming   Connolly, Miss. Kate
3rd row   Montvila, Rev. Juozas            Rugg, Miss. Emily
4th row   Caldwell, Master. Alden Gates    Williams, Mr. Charles Duane
5th row   Sage, Miss. Stella Anna          Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)

Dataset A

Value                Count   Frequency (%)
mr                   254     13.8%
miss                 89      4.8%
mrs                  70      3.8%
william              30      1.6%
master               22      1.2%
john                 22      1.2%
henry                19      1.0%
charles              15      0.8%
george               13      0.7%
james                13      0.7%
Other values (887)   1292    70.3%

Dataset B

Value                Count   Frequency (%)
mr                   261     14.5%
miss                 100     5.6%
mrs                  57      3.2%
william              27      1.5%
john                 24      1.3%
master               18      1.0%
henry                16      0.9%
thomas               14      0.8%
mary                 12      0.7%
charles              11      0.6%
Other values (896)   1254    69.9%

Most occurring characters

Dataset A

Value               Count   Frequency (%)
(space)             1395    11.4%
r                   1032    8.4%
e                   860     7.0%
a                   852     7.0%
s                   680     5.6%
i                   669     5.5%
n                   650     5.3%
M                   564     4.6%
l                   530     4.3%
o                   513     4.2%
Other values (49)   4504    36.8%

Dataset B

Value               Count   Frequency (%)
(space)             1350    11.4%
r                   932     7.9%
a                   837     7.1%
e                   798     6.7%
s                   665     5.6%
n                   656     5.5%
i                   641     5.4%
M                   563     4.8%
l                   517     4.4%
o                   506     4.3%
Other values (49)   4365    36.9%

Most occurring categories, scripts, and blocks

All characters (12249 in Dataset A, 11830 in Dataset B; 100.0% each) fall into a single Unicode category, script, and block, each reported as (unknown). The most frequent characters per category, per script, and per block are therefore the same as the counts under "Most occurring characters".

Sex
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7219731   4.7085202
Min length      4           4

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      2106        2100
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   male        male
2nd row   male        female
3rd row   male        female
4th row   male        male
5th row   female      female

Common Values

Value    Dataset A count (%)   Dataset B count (%)
male     285 (63.9%)           288 (64.6%)
female   161 (36.1%)           158 (35.4%)

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: one plot each for Dataset A and Dataset B]

Most occurring characters

Character   Dataset A count (%)   Dataset B count (%)
e           607 (28.8%)           604 (28.8%)
m           446 (21.2%)           446 (21.2%)
a           446 (21.2%)           446 (21.2%)
l           446 (21.2%)           446 (21.2%)
f           161 (7.6%)            158 (7.5%)
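
The character counts for Sex follow directly from the category counts, since every row is either "male" or "female". The sketch below derives them with the standard library. The function name is hypothetical, and the approach is an illustration rather than the profiler's own implementation.

```python
from collections import Counter


def character_counts(category_counts):
    """Count characters across all rows, given per-category row counts."""
    counts = Counter()
    for category, n_rows in category_counts.items():
        for char in category:
            counts[char] += n_rows
    return counts


sex_a = {"male": 285, "female": 161}  # Dataset A counts from Common Values
chars_a = character_counts(sex_a)
total_chars = sum(chars_a.values())
mean_length = total_chars / sum(sex_a.values())
print(chars_a.most_common(), total_chars, round(mean_length, 7))
```

The same call with the Dataset B counts (288 male, 158 female) gives the Dataset B column.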

Most occurring categories, scripts, and blocks

All characters (2106 in Dataset A, 2100 in Dataset B; 100.0% each) fall into a single Unicode category, script, and block, each reported as (unknown). The most frequent characters per category, per script, and per block are therefore the same as the counts under "Most occurring characters".

Age
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        76           76
Distinct (%)    21.8%        21.3%
Missing         98           89
Missing (%)     22.0%        20.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            29.328534    29.771709
Minimum         0.42         0.42
Maximum         74           74
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
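
The Distinct (%) values for Age are consistent with a denominator of non-missing observations rather than all 446 rows. A short check, assuming that definition (it is inferred from the reported numbers, not taken from the ydata-profiling source):

```python
def distinct_pct(n_distinct, n_observations, n_missing):
    """Distinct values as a percentage of non-missing observations."""
    return 100.0 * n_distinct / (n_observations - n_missing)


# Age: 76 distinct values in both datasets, 446 rows, 98 / 89 missing.
print(round(distinct_pct(76, 446, 98), 1))  # Dataset A
print(round(distinct_pct(76, 446, 89), 1))  # Dataset B
```

The Cabin figures (80 and 91 distinct values with 349 and 337 missing) fit the same formula.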

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0.42        0.42
5-th percentile             3.35        4
Q1                          19          20
median                      28          28
Q3                          38          39
95-th percentile            57          56
Maximum                     74          74
Range                       73.58       73.58
Interquartile range (IQR)   19          19

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                14.99007        14.415345
Coefficient of variation (CV)     0.51110874      0.48419609
Kurtosis                          0.03506182      -0.081335331
Mean                              29.328534       29.771709
Median Absolute Deviation (MAD)   9               9
Skewness                          0.3372763       0.322972
Sum                               10206.33        10628.5
Variance                          224.70221       207.80217
Monotonicity                      Not monotonic   Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values, Dataset A

Value               Count   Frequency (%)
25                  16      3.6%
21                  15      3.4%
30                  15      3.4%
18                  15      3.4%
32                  14      3.1%
19                  13      2.9%
36                  12      2.7%
22                  12      2.7%
28                  11      2.5%
35                  11      2.5%
Other values (66)   214     48.0%
(Missing)           98      22.0%

Common values, Dataset B

Value               Count   Frequency (%)
18                  18      4.0%
24                  16      3.6%
22                  15      3.4%
25                  13      2.9%
21                  12      2.7%
19                  11      2.5%
28                  11      2.5%
30                  11      2.5%
27                  10      2.2%
20                  9       2.0%
Other values (66)   231     51.8%
(Missing)           89      20.0%

Smallest values

Value   Dataset A count (%)   Dataset B count (%)
0.42    1 (0.2%)              1 (0.2%)
0.75    1 (0.2%)              1 (0.2%)
0.83    2 (0.4%)              1 (0.2%)
1       4 (0.9%)              3 (0.7%)
2       6 (1.3%)              5 (1.1%)
3       4 (0.9%)              3 (0.7%)
4       8 (1.8%)              5 (1.1%)
5       2 (0.4%)              2 (0.4%)
6       1 (0.2%)              1 (0.2%)
7       2 (0.4%)              2 (0.4%)

SibSp
Real number (ℝ)

Statistic       Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.51569507    0.49775785
Minimum         0             0
Maximum         8             8
Zeros           303           309
Zeros (%)       67.9%         69.3%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            2           2
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                1.1029991       1.0928756
Coefficient of variation (CV)     2.1388592       2.1955968
Kurtosis                          20.129047       20.844338
Mean                              0.51569507      0.49775785
Median Absolute Deviation (MAD)   0               0
Skewness                          3.9316316       4.0005197
Sum                               230             222
Variance                          1.216607        1.194377
Monotonicity                      Not monotonic   Not monotonic

[Figure: histogram with fixed size bins (bins=7)]
Common values

Value   Dataset A count (%)   Dataset B count (%)
0       303 (67.9%)           309 (69.3%)
1       109 (24.4%)           104 (23.3%)
2       13 (2.9%)             12 (2.7%)
3       8 (1.8%)              8 (1.8%)
4       6 (1.3%)              7 (1.6%)
8       4 (0.9%)              4 (0.9%)
5       3 (0.7%)              2 (0.4%)

Values in ascending order

Value   Dataset A count (%)   Dataset B count (%)
0       303 (67.9%)           309 (69.3%)
1       109 (24.4%)           104 (23.3%)
2       13 (2.9%)             12 (2.7%)
3       8 (1.8%)              8 (1.8%)
4       6 (1.3%)              7 (1.6%)
5       3 (0.7%)              2 (0.4%)
8       4 (0.9%)              4 (0.9%)
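
Because SibSp has only seven distinct values, its sum, mean, and variance can be reproduced from the value counts alone. The sketch below does so, assuming the reported variance is the sample variance (divisor n - 1). That assumption matches the reported figures but is not confirmed by the report itself, and the function name is hypothetical.

```python
def stats_from_counts(value_counts):
    """Return (sum, mean, sample variance) from a {value: count} mapping."""
    n = sum(value_counts.values())
    total = sum(value * count for value, count in value_counts.items())
    mean = total / n
    squared_dev = sum(count * (value - mean) ** 2 for value, count in value_counts.items())
    return total, mean, squared_dev / (n - 1)  # divisor n - 1 is an assumption


# Value counts for SibSp as reported for each dataset.
sibsp_a = {0: 303, 1: 109, 2: 13, 3: 8, 4: 6, 5: 3, 8: 4}
sibsp_b = {0: 309, 1: 104, 2: 12, 3: 8, 4: 7, 5: 2, 8: 4}
print(stats_from_counts(sibsp_a))
print(stats_from_counts(sibsp_b))
```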

Parch
Real number (ℝ)

Statistic       Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.42152466    0.35201794
Minimum         0             0
Maximum         6             6
Zeros           332           341
Zeros (%)       74.4%         76.5%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           0
95-th percentile            2           2
Maximum                     6           6
Range                       6           6
Interquartile range (IQR)   1           0

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                0.86473138      0.75212535
Coefficient of variation (CV)     2.0514372       2.1366109
Kurtosis                          9.7118903       12.371668
Mean                              0.42152466      0.35201794
Median Absolute Deviation (MAD)   0               0
Skewness                          2.7362506       2.9482384
Sum                               188             157
Variance                          0.74776037      0.56569255
Monotonicity                      Not monotonic   Not monotonic

[Figure: histogram with fixed size bins (bins=7)]
Common values, Dataset A

Value   Count   Frequency (%)
0       332     74.4%
1       60      13.5%
2       46      10.3%
5       3       0.7%
4       3       0.7%
6       1       0.2%
3       1       0.2%

Common values, Dataset B

Value   Count   Frequency (%)
0       341     76.5%
1       65      14.6%
2       35      7.8%
4       2       0.4%
6       1       0.2%
3       1       0.2%
5       1       0.2%

Values in ascending order

Value   Dataset A count (%)   Dataset B count (%)
0       332 (74.4%)           341 (76.5%)
1       60 (13.5%)            65 (14.6%)
2       46 (10.3%)            35 (7.8%)
3       1 (0.2%)              1 (0.2%)
4       3 (0.7%)              2 (0.4%)
5       3 (0.7%)              1 (0.2%)
6       1 (0.2%)              1 (0.2%)

Ticket
Text

Statistic      Dataset A   Dataset B
Distinct       369         385
Distinct (%)   82.7%       86.3%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.6681614   6.6995516
Min length      3           4

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      2974        2988
Distinct characters   35          32
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       311         337
Unique (%)   69.7%       75.6%

Sample

Row       Dataset A   Dataset B
1st row   347063      C 7076
2nd row   239853      370373
3rd row   211536      C.A. 31026
4th row   248738      PC 17597
5th row   CA. 2343    11668

Dataset A

Value                Count   Frequency (%)
pc                   27      4.8%
c.a                  12      2.1%
a/5                  9       1.6%
ca                   9       1.6%
soton/oq             6       1.1%
a/4                  5       0.9%
w./c                 5       0.9%
ston/o2              4       0.7%
ston/o               4       0.7%
2                    4       0.7%
Other values (389)   476     84.8%

Dataset B

Value                Count   Frequency (%)
pc                   29      5.2%
c.a                  12      2.2%
a/5                  8       1.4%
ca                   8       1.4%
f.c.c                5       0.9%
2                    5       0.9%
ston/o               5       0.9%
sc/paris             5       0.9%
w./c                 4       0.7%
1601                 4       0.7%
Other values (403)   473     84.8%

Most occurring characters

Dataset A

Value               Count   Frequency (%)
3                   390     13.1%
1                   335     11.3%
2                   293     9.9%
7                   234     7.9%
4                   232     7.8%
6                   206     6.9%
5                   202     6.8%
0                   198     6.7%
9                   160     5.4%
8                   139     4.7%
Other values (25)   585     19.7%

Dataset B

Value               Count   Frequency (%)
3                   394     13.2%
1                   361     12.1%
2                   281     9.4%
4                   243     8.1%
7                   240     8.0%
6                   207     6.9%
5                   196     6.6%
0                   196     6.6%
9                   165     5.5%
8                   141     4.7%
Other values (22)   564     18.9%

Most occurring categories, scripts, and blocks

All characters (2974 in Dataset A, 2988 in Dataset B; 100.0% each) fall into a single Unicode category, script, and block, each reported as (unknown). The most frequent characters per category, per script, and per block are therefore the same as the counts under "Most occurring characters".

Fare
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        182          184
Distinct (%)    40.8%        41.3%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            31.591143    32.636071
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           9            7
Zeros (%)       2.0%         1.6%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.225       7.225
Q1                          7.9031      7.8958
median                      15.025      14.4271
Q3                          31          32.087475
95-th percentile            112.67708   108.9
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   23.0969     24.191675

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                46.270476       51.97986
Coefficient of variation (CV)     1.4646661       1.592712
Kurtosis                          32.041157       35.72501
Mean                              31.591143       32.636071
Median Absolute Deviation (MAD)   7.5021          6.7375
Skewness                          4.5589325       5.0330051
Sum                               14089.65        14555.688
Variance                          2140.957        2701.9058
Monotonicity                      Not monotonic   Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values, Dataset A

Value                Count   Frequency (%)
8.05                 20      4.5%
13                   19      4.3%
7.8958               19      4.3%
26                   18      4.0%
7.75                 15      3.4%
7.25                 10      2.2%
10.5                 9       2.0%
0                    9       2.0%
7.775                8       1.8%
7.2292               8       1.8%
Other values (172)   311     69.7%

Common values, Dataset B

Value                Count   Frequency (%)
8.05                 24      5.4%
7.8958               19      4.3%
7.75                 18      4.0%
13                   18      4.0%
26                   16      3.6%
10.5                 12      2.7%
26.55                10      2.2%
7.925                9       2.0%
7.775                8       1.8%
7.2292               8       1.8%
Other values (174)   304     68.2%

Smallest values, Dataset A

Value    Count   Frequency (%)
0        9       2.0%
5        1       0.2%
6.4375   1       0.2%
6.45     1       0.2%
6.4958   1       0.2%
6.75     1       0.2%
6.8583   1       0.2%
6.95     1       0.2%
7.05     2       0.4%
7.125    2       0.4%

Smallest values, Dataset B

Value    Count   Frequency (%)
0        7       1.6%
4.0125   1       0.2%
6.2375   1       0.2%
6.4958   2       0.4%
6.75     1       0.2%
6.8583   1       0.2%
6.95     1       0.2%
6.975    1       0.2%
7.05     2       0.4%
7.0542   2       0.4%

Cabin
Text

Statistic      Dataset A   Dataset B
Distinct       80          91
Distinct (%)   82.5%       83.5%
Missing        349         337
Missing (%)    78.3%       75.6%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.7010309   3.5321101
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      359         385
Distinct characters   18          19
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       66          74
Unique (%)   68.0%       67.9%

Sample

Row       Dataset A   Dataset B
1st row   C65         D49
2nd row   B4          F4
3rd row   D           E17
4th row   A32         E8
5th row   C104        F E69

Dataset A

Value               Count   Frequency (%)
b96                 4       3.4%
b98                 4       3.4%
f2                  3       2.6%
f                   3       2.6%
c65                 2       1.7%
b28                 2       1.7%
d                   2       1.7%
b77                 2       1.7%
g73                 2       1.7%
b51                 2       1.7%
Other values (81)   91      77.8%

Dataset B

Value               Count   Frequency (%)
g6                  3       2.4%
f                   3       2.4%
f33                 2       1.6%
c22                 2       1.6%
c26                 2       1.6%
d33                 2       1.6%
c124                2       1.6%
b18                 2       1.6%
c23                 2       1.6%
f2                  2       1.6%
Other values (92)   103     82.4%

Most occurring characters

Dataset A

Value              Count   Frequency (%)
B                  39      10.9%
2                  33      9.2%
5                  27      7.5%
C                  27      7.5%
3                  26      7.2%
1                  26      7.2%
6                  24      6.7%
9                  22      6.1%
(space)            20      5.6%
8                  17      4.7%
Other values (8)   98      27.3%

Dataset B

Value              Count   Frequency (%)
C                  43      11.2%
2                  41      10.6%
1                  31      8.1%
3                  30      7.8%
6                  29      7.5%
B                  24      6.2%
8                  22      5.7%
4                  20      5.2%
D                  19      4.9%
7                  19      4.9%
Other values (9)   107     27.8%

Most occurring categories, scripts, and blocks

All characters (359 in Dataset A, 385 in Dataset B; 100.0% each) fall into a single Unicode category, script, and block, each reported as (unknown). The most frequent characters per category, per script, and per block are therefore the same as the counts under "Most occurring characters".

Embarked
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        2           2
Missing (%)    0.4%        0.4%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      444         444
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   S           S
2nd row   S           Q
3rd row   S           S
4th row   S           C
5th row   S           S

Common Values

Value       Dataset A count (%)   Dataset B count (%)
S           319 (71.5%)           323 (72.4%)
C           84 (18.8%)            84 (18.8%)
Q           41 (9.2%)             37 (8.3%)
(Missing)   2 (0.4%)              2 (0.4%)

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: one plot each for Dataset A and Dataset B]

Value   Dataset A count (%)   Dataset B count (%)
s       319 (71.8%)           323 (72.7%)
c       84 (18.9%)            84 (18.9%)
q       41 (9.2%)             37 (8.3%)

Most occurring characters

Character   Dataset A count (%)   Dataset B count (%)
S           319 (71.8%)           323 (72.7%)
C           84 (18.9%)            84 (18.9%)
Q           41 (9.2%)             37 (8.3%)

Most occurring categories, scripts, and blocks

In both datasets, all 444 characters (100.0%) fall into a single Unicode category, script, and block, each reported as (unknown). The most frequent characters per category, per script, and per block are therefore the same as the counts under "Most occurring characters".

Interactions

[Figures: interaction plots, shown separately for Dataset A and Dataset B]

Dataset A

2025-01-20T16:51:21.637517image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-01-20T16:51:24.048547image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-01-20T16:51:21.998126image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-01-20T16:51:24.407754image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

[Figures: correlation plots for Dataset A and Dataset B, images not included in this text version]

Dataset A

Variable        Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.000   0.135  -0.263        0.068   0.272   0.177  -0.213     0.279
Embarked      0.000     1.000   0.187   0.028        0.000   0.260   0.086   0.082     0.188
Fare          0.135     0.187   1.000   0.419       -0.022   0.486   0.157   0.448     0.251
Parch        -0.263     0.028   0.419   1.000       -0.028   0.000   0.251   0.471     0.140
PassengerId   0.068     0.000  -0.022  -0.028        1.000   0.000   0.141  -0.069     0.160
Pclass        0.272     0.260   0.486   0.000        0.000   1.000   0.120   0.099     0.380
Sex           0.177     0.086   0.157   0.251        0.141   0.120   1.000   0.185     0.549
SibSp        -0.213     0.082   0.448   0.471       -0.069   0.099   0.185   1.000     0.214
Survived      0.279     0.188   0.251   0.140        0.160   0.380   0.549   0.214     1.000

Dataset B

Variable        Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.000   0.128  -0.279        0.028   0.272   0.000  -0.163     0.043
Embarked      0.000     1.000   0.213   0.000        0.000   0.253   0.044   0.052     0.118
Fare          0.128     0.213   1.000   0.397       -0.014   0.493   0.092   0.431     0.243
Parch        -0.279     0.000   0.397   1.000       -0.065   0.000   0.216   0.385     0.088
PassengerId   0.028     0.000  -0.014  -0.065        1.000   0.000   0.000  -0.065     0.000
Pclass        0.272     0.253   0.493   0.000        0.000   1.000   0.134   0.122     0.318
Sex           0.000     0.044   0.092   0.216        0.000   0.134   1.000   0.173     0.498
SibSp        -0.163     0.052   0.431   0.385       -0.065   0.122   0.173   1.000     0.100
Survived      0.043     0.118   0.243   0.088        0.000   0.318   0.498   0.100     1.000
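
The two correlation matrices can also be compared programmatically. The sketch below copies the Survived column from each table and reports which variables meet a high-correlation cut-off and where the two datasets differ most. The 0.5 cut-off and the helper names (`high_correlations`, `largest_gap`) are assumptions made for illustration; they are not values or functions taken from this report.

```python
# Correlation of each variable with Survived, copied from the two tables above.
SURVIVED_CORR = {
    "Dataset A": {"Age": 0.279, "Embarked": 0.188, "Fare": 0.251, "Parch": 0.140,
                  "PassengerId": 0.160, "Pclass": 0.380, "Sex": 0.549, "SibSp": 0.214},
    "Dataset B": {"Age": 0.043, "Embarked": 0.118, "Fare": 0.243, "Parch": 0.088,
                  "PassengerId": 0.000, "Pclass": 0.318, "Sex": 0.498, "SibSp": 0.100},
}

# Assumed cut-off for a "high correlation" flag; not stated in this section.
THRESHOLD = 0.5


def high_correlations(corr, threshold=THRESHOLD):
    """Return the variables whose absolute correlation meets the threshold."""
    return sorted(name for name, value in corr.items() if abs(value) >= threshold)


def largest_gap(corr_a, corr_b):
    """Return (variable, gap) for the largest absolute difference between datasets."""
    gaps = {name: abs(corr_a[name] - corr_b[name]) for name in corr_a}
    name = max(gaps, key=gaps.get)
    return name, round(gaps[name], 3)


flagged = {label: high_correlations(corr) for label, corr in SURVIVED_CORR.items()}
gap = largest_gap(SURVIVED_CORR["Dataset A"], SURVIVED_CORR["Dataset B"])
print(flagged)
print(gap)
```

With the values above, only Sex in Dataset A reaches the assumed cut-off (0.549, against 0.498 in Dataset B), and Age shows the largest difference between the two datasets (0.279 against 0.043).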

Missing values

[Figures for Dataset A and Dataset B: images not included in this text version]
A simple visualization of nullity by column.
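
The per-column nullity that this chart displays can be computed directly. Below is a minimal sketch in plain Python, assuming rows are dictionaries and that both `None` and float NaN count as missing; the function names `is_missing` and `nullity_by_column` are hypothetical. The four rows are taken from the Sample section of this report, restricted to Age and Cabin.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_by_column(rows, columns):
    """Return {column: (missing_count, missing_percent)} for a list of dict rows."""
    total = len(rows)
    summary = {}
    for column in columns:
        missing = sum(is_missing(row.get(column)) for row in rows)
        percent = round(100.0 * missing / total, 1) if total else 0.0
        summary[column] = (missing, percent)
    return summary


# Age and Cabin for four passengers listed in the Sample section.
rows = [
    {"Age": 31.0, "Cabin": None},           # Johansson, Mr. Karl Johan
    {"Age": None, "Cabin": None},           # Cunningham, Mr. Alfred Fleming
    {"Age": 17.0, "Cabin": "C65"},          # Penasco y Castellana, Mrs. ...
    {"Age": float("nan"), "Cabin": None},   # Kassem, Mr. Fared
]

summary = nullity_by_column(rows, ["Age", "Cabin"])
print(summary)
```

Applied to a full dataset, the same calculation would yield the counts and percentages listed under Alerts, such as the missing values reported for Age and Cabin.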

[Figures for Dataset A and Dataset B: images not included in this text version]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

[Figures for Dataset A and Dataset B: images not included in this text version]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
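
One common way to quantify nullity correlation is the Pearson correlation between the missing-value indicators of two columns. The sketch below assumes that definition; the report does not state which formula it uses. The function name `nullity_correlation` and the indicator values are hypothetical.

```python
import math


def nullity_correlation(mask_x, mask_y):
    """Pearson correlation between two 0/1 missing-value indicators."""
    n = len(mask_x)
    mean_x = sum(mask_x) / n
    mean_y = sum(mask_y) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(mask_x, mask_y))
    var_x = sum((x - mean_x) ** 2 for x in mask_x)
    var_y = sum((y - mean_y) ** 2 for y in mask_y)
    if var_x == 0 or var_y == 0:
        # Undefined when a column is always present or always missing.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical indicators (1 = missing) for six rows of two columns.
age_missing = [1, 0, 0, 1, 0, 0]
cabin_missing = [1, 1, 0, 1, 0, 1]

r = nullity_correlation(age_missing, cabin_missing)
print(round(r, 3))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a value near 0 means their missingness is unrelated. Columns with no missing values have no defined nullity correlation.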

Sample

Dataset A

Index  PassengerId  Survived  Pclass  Name                                                                                Sex     Age    SibSp  Parch  Ticket      Fare      Cabin  Embarked
805    806          0         3       Johansson, Mr. Karl Johan                                                           male    31.00  0      0      347063      7.7750    NaN    S
413    414          0         2       Cunningham, Mr. Alfred Fleming                                                      male    NaN    0      0      239853      0.0000    NaN    S
886    887          0         2       Montvila, Rev. Juozas                                                               male    27.00  0      0      211536      13.0000   NaN    S
78     79           1         2       Caldwell, Master. Alden Gates                                                       male    0.83   0      2      248738      29.0000   NaN    S
792    793          0         3       Sage, Miss. Stella Anna                                                             female  NaN    8      2      CA. 2343    69.5500   NaN    S
307    308          1         1       Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)  female  17.00  1      0      PC 17758    108.9000  C65    C
259    260          1         2       Parrish, Mrs. (Lutie Davis)                                                         female  50.00  0      1      230433      26.0000   NaN    S
201    202          0         3       Sage, Mr. Frederick                                                                 male    NaN    8      2      CA. 2343    69.5500   NaN    S
524    525          0         3       Kassem, Mr. Fared                                                                   male    NaN    0      0      2700        7.2292    NaN    C
56     57           1         2       Rugg, Miss. Emily                                                                   female  21.00  0      0      C.A. 31026  10.5000   NaN    S

Dataset B

Index  PassengerId  Survived  Pclass  Name                                                      Sex     Age    SibSp  Parch  Ticket      Fare      Cabin  Embarked
365    366          0         3       Adahl, Mr. Mauritz Nils Martin                            male    30.0   0      0      C 7076      7.2500    NaN    S
289    290          1         3       Connolly, Miss. Kate                                      female  22.0   0      0      370373      7.7500    NaN    Q
56     57           1         2       Rugg, Miss. Emily                                         female  21.0   0      0      C.A. 31026  10.5000   NaN    S
155    156          0         1       Williams, Mr. Charles Duane                               male    51.0   0      1      PC 17597    61.3792   NaN    C
41     42           0         2       Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)  female  27.0   1      0      11668       21.0000   NaN    S
350    351          0         3       Odahl, Mr. Nils Martin                                    male    23.0   0      0      7267        9.2250    NaN    S
681    682          1         1       Hassab, Mr. Hammad                                        male    27.0   0      0      PC 17572    76.7292   D49    C
874    875          1         2       Abelson, Mrs. Samuel (Hannah Wizosky)                     female  28.0   1      0      P/PP 3381   24.0000   NaN    C
794    795          0         3       Dantcheff, Mr. Ristiu                                     male    25.0   0      0      349203      7.8958    NaN    S
368    369          1         3       Jermyn, Miss. Annie                                       female  NaN    0      0      14313       7.7500    NaN    Q

Dataset A

Index  PassengerId  Survived  Pclass  Name                                             Sex     Age    SibSp  Parch  Ticket           Fare      Cabin  Embarked
647    648          1         1       Simonius-Blumer, Col. Oberst Alfons              male    56.0   0      0      13213            35.5000   A26    C
588    589          0         3       Gilinski, Mr. Eliezer                            male    22.0   0      0      14973            8.0500    NaN    S
672    673          0         2       Mitchell, Mr. Henry Michael                      male    70.0   0      0      C.A. 24580       10.5000   NaN    S
430    431          1         1       Bjornstrom-Steffansson, Mr. Mauritz Hakan        male    28.0   0      0      110564           26.5500   C52    S
157    158          0         3       Corn, Mr. Harry                                  male    30.0   0      0      SOTON/OQ 392090  8.0500    NaN    S
447    448          1         1       Seward, Mr. Frederic Kimber                      male    34.0   0      0      113794           26.5500   NaN    S
231    232          0         3       Larsson, Mr. Bengt Edvin                         male    29.0   0      0      347067           7.7750    NaN    S
486    487          1         1       Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)  female  35.0   1      0      19943            90.0000   C93    S
186    187          1         3       O'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey)  female  NaN    1      0      370365           15.5000   NaN    Q
685    686          0         2       Laroche, Mr. Joseph Philippe Lemercier           male    25.0   1      2      SC/Paris 2123    41.5792   NaN    C

Dataset B

Index  PassengerId  Survived  Pclass  Name                                       Sex     Age    SibSp  Parch  Ticket      Fare      Cabin  Embarked
412    413          1         1       Minahan, Miss. Daisy E                     female  33.0   1      0      19928       90.0000   C78    Q
711    712          0         1       Klaber, Mr. Herman                         male    NaN    0      0      113028      26.5500   C124   S
309    310          1         1       Francatelli, Miss. Laura Mabel             female  30.0   0      0      PC 17485    56.9292   E36    C
430    431          1         1       Bjornstrom-Steffansson, Mr. Mauritz Hakan  male    28.0   0      0      110564      26.5500   C52    S
496    497          1         1       Eustis, Miss. Elizabeth Mussey             female  54.0   1      0      36947       78.2667   D20    C
147    148          0         3       Ford, Miss. Robina Maggie "Ruby"           female  9.0    2      2      W./C. 6608  34.3750   NaN    S
867    868          0         1       Roebling, Mr. Washington Augustus II       male    31.0   0      0      PC 17590    50.4958   A24    S
641    642          1         1       Sagesser, Mlle. Emma                       female  24.0   0      0      PC 17477    69.3000   B35    C
225    226          0         3       Berglund, Mr. Karl Ivar Sven               male    22.0   0      0      PP 4348     9.3500    NaN    S
54     55           0         1       Ostby, Mr. Engelhart Cornelius             male    65.0   0      1      113509      61.9792   B30    C

Duplicate rows

Dataset A

PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  # duplicates
Dataset does not contain duplicate rows.
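
A duplicate-row check of this kind can be sketched as a count of identical rows. The example below is a minimal illustration in plain Python, not the report's own implementation; the function name `duplicate_rows` is hypothetical, and the third row is a deliberate repeat added to show a non-empty result.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# (PassengerId, Name, Ticket) for two passengers from the Sample section.
rows = [
    (57, "Rugg, Miss. Emily", "C.A. 31026"),
    (431, "Bjornstrom-Steffansson, Mr. Mauritz Hakan", "110564"),
    (57, "Rugg, Miss. Emily", "C.A. 31026"),  # deliberate repeat for illustration
]

dupes = duplicate_rows(rows)
print(dupes)
```

Because PassengerId is unique within each dataset (see Alerts), no two complete rows can be identical, which is consistent with the empty result reported for both Dataset A and Dataset B.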